{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "extraordinary-world",
   "metadata": {},
   "source": [
    "# Ungraded Lab: Gradient Descent for Logistic Regression\n",
    "\n",
    "In this lab, you will implement the gradient descent update step for logistic regression.\n",
    "\n",
    "## Dataset \n",
    "Let's start with the same dataset as before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "classical-heather",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])\n",
    "y = np.array([0, 0, 0, 1, 1, 1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "suited-florist",
   "metadata": {},
   "source": [
    "As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as black circles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "brazilian-volunteer",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lab_utils import plot_data\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plot_data(X,y)\n",
    "\n",
    "# Set both axes to be from 0-6\n",
    "plt.axis([0, 6, 0, 6])\n",
    "# Set the y-axis label\n",
    "plt.ylabel('$x_2$')\n",
    "# Set the x-axis label\n",
    "plt.xlabel('$x_1$')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "physical-least",
   "metadata": {},
   "source": [
    "## Gradient descent\n",
    "\n",
    " First, you will implement a non-vectorized version of the gradient. Then, you will implement a vectorized version.\n",
    "\n",
    "\n",
    "### Non- vectorized version\n",
    "\n",
    "Recall that in gradient descent, each iteration performs the update:\n",
    "\n",
    "$$w_j :=w_j - \\alpha \\frac{\\partial J(w_j)}{\\partial w}$$ \n",
    "\n",
    "simultaneously update $w_j$ for all $j$, where\n",
    "\n",
    "$$ \\frac{\\partial J(w)}{\\partial w_j}  = \\frac{1}{m} \\sum\\limits_{i = 1}^{m} (f_{w}(x^{(i)}) - y^{(i)})x_j^{(i)}$$ \n",
    "\n",
    "- **Note**: While this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of $f_w(x)$.\n",
    "\n",
    "You'll implement $\\frac{\\partial J(w_j)}{\\partial w}$ in this lab. \n",
    "* m is the number of training examples in the dataset\n",
    "\n",
    "    \n",
    "*  $f_{w}(x^{(i)})$ is the model's prediction, while $y^{(i)}$, which is the actual label\n",
    "\n",
    "* For a logistic regression model for the dataset given above, the model can be representented as:\n",
    "\n",
    "    $f_{w}(x) = g(w_0 + w_1x_1 + w_2x_2)$\n",
    "\n",
    "    where $g(z)$ is the sigmoid function:\n",
    "\n",
    "    $g(z) = \\frac{1}{1+e^{-z}}$ \n",
    "    \n",
    "    \n",
    "* **Preprocessing step** \n",
    "\n",
    "   For ease of implementation, we will add an additional column of ones to $X$ (as $x_0$) so that  \n",
    "    $f_{w}(x) = g(w_0x_0 + w_1x_1 + w_2x_2)$\n",
    "    \n",
    "    By doing this, to calculate the prediction from the model $f_{w}(x)$, we can write a for loop that calculates $w_jx_j$ at each step"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "earned-description",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add a column to X_orig to account for the w_0 term\n",
    "X_mod = np.hstack([np.ones((X.shape[0],1)), X])\n",
    "\n",
    "print(X_mod)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "postal-giant",
   "metadata": {},
   "source": [
    "#### Side Note: sigmoid function implementation\n",
    "We've implemented the `sigmoid` function for you already and you can simply import and use it, as shown in the code block below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "structured-symbol",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lab_utils import sigmoid \n",
    "\n",
    "print(sigmoid(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "perceived-ladder",
   "metadata": {},
   "source": [
    "#### Implementation\n",
    "\n",
    "Now, you'll implement the non-vectorized version of the gradient. We've already provided some starter code for you which does the following -\n",
    "* We create an array to hold the gradients (called `dw`) with the same shape as $w$ and initialize it with zeros. We will update and return `dw` \n",
    "* There is a for loop to calculate `dw[j]` at each iteration\n",
    "* At each iteration, we can use the gradient formula above, which involves another for loop over all the examples in the dataset\n",
    "* We store the gradient value for each example in a list and the gradient `dw[j]` is then computed as the sum of gradient for each example divided by the number of examples\n",
    "\n",
    "We assume that the function takes in the paramaters $w$ as a list/array.\n",
    "\n",
    "**Exercise**\n",
    "\n",
    "You'll complete the cost function by implementing the following steps inside the inner for loop - \n",
    "\n",
    "* First, you'll compute the models prediction $f(x^{(i)})$ for a single data point at index $i$ as shown below\n",
    "\n",
    "   ```\n",
    "   z = 0\n",
    "   for j in range(n):\n",
    "       z += w[j] * X[i][j]\n",
    "   f = sigmoid(z)\n",
    "   ```\n",
    "   \n",
    "   Since $w_0x_0 + w_1x_1+w_2x_2 = w^T\\cdot x$, you can also calculate  $f(x^{(i)})$ as \n",
    "   ```\n",
    "   z = np.dot(w.T, X[i])\n",
    "   f = sigmoid(z)\n",
    "   ```\n",
    "\n",
    "* Then, you'll compute the gradient for the single data point at index $i$ as \n",
    "\n",
    "  ```\n",
    "  gradient = (f-y[i])*X[i][j] \n",
    "  ```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "offshore-resolution",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient(X, y, w):\n",
    "    # Here X is assumed to pre-processed with a column of ones added as x_0\n",
    "    m, n = X.shape\n",
    "    dw = np.zeros_like(w)\n",
    "    \n",
    "    for j in range(n):\n",
    "        gradient_list = []\n",
    "        \n",
    "        for i in range(m):        \n",
    "            ### START CODE HERE ### \n",
    "             \n",
    "            ### END CODE HERE ### \n",
    "            gradient_list.append(gradient)\n",
    "        \n",
    "        dw[j] = (1/m)* sum(gradient_list)\n",
    "        \n",
    "    return dw"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "concerned-proxy",
   "metadata": {},
   "outputs": [],
   "source": [
    "Check the implementation of your gradient function using the cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "former-atmosphere",
   "metadata": {},
   "outputs": [],
   "source": [
    "w = np.zeros(3)\n",
    "compute_gradient(X_mod,y,w)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "accessible-prison",
   "metadata": {},
   "source": [
    "**Expected output**\n",
    "\n",
    "array([ 0.        , -0.25      , -0.16666667])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "brilliant-semester",
   "metadata": {},
   "source": [
    "### (Optional ) Vectorized version\n",
    "\n",
    "You will now implement a vectorized version of the gradient function.\n",
    "\n",
    "The vectorized version of the gradient formula is \n",
    "\n",
    "$$\\frac{\\partial J(w_j)}{\\partial w} = \\frac{1}{m} X^T (f - y)$$ \n",
    "\n",
    "where\n",
    "\n",
    "$$ f = g(X \\cdot w)$$\n",
    "\n",
    "As before, $g$ is the sigmoid function\n",
    "\n",
    "**Exercise**\n",
    "\n",
    "You'll complete the vectorized cost function by implementing the following steps - \n",
    "\n",
    "* First, you'll compute the models prediction $f(x)$ as shown below\n",
    "\n",
    "   ```\n",
    "   z = np.dot(X, w)\n",
    "   f = sigmoid(z)\n",
    "   ```\n",
    "  \n",
    "\n",
    "* Then, you'll compute the gradient as \n",
    "\n",
    "  ```\n",
    "  dw = (1/m)*np.dot(X.T, (f - y))\n",
    "  ```\n",
    "\n",
    "**Debugging Tip:** Vectorizing code can sometimes be tricky. One common strategy for debugging is to print out the sizes of the matrices you are working with using the size function. For example, given a data matrix $X$ of size 6 × 3 (6 examples, 3 features) and $w$, a vector with dimensions 3x1, you can observe that $Xw$ is a valid multiplication operation, while $wX$ is not."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "limiting-split",
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_vectorized(X, y, w):\n",
    "    \n",
    "    m, n = X.shape\n",
    "    \n",
    "    ### START CODE HERE ### \n",
    "    \n",
    "    ### END CODE HERE ### \n",
    "    return dw"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "flush-aaron",
   "metadata": {},
   "source": [
    "Now let's check if the output of this function is equivalent to the output of your non-vectorized implementation above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "public-criterion",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Cost computed by non-vectorized version: \", compute_gradient(X_mod, y, w))\n",
    "print(\"Cost computed by vectorized version: \", compute_gradient_vectorized(X_mod, y, w))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "gorgeous-gazette",
   "metadata": {},
   "source": [
    "**Expected output** \n",
    "\n",
    "Cost computed by non-vectorized version:  [ 0.         -0.25       -0.16666667]\n",
    "\n",
    "Cost computed by vectorized version:  [ 0.         -0.25       -0.16666667]\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
